Abstract. Given a finite set of linearly independent quantum states, an observer who examines a single quantum system may sometimes identify its state with certainty. However, unless these quantum states are orthogonal, there is a finite probability of failure. A complete solution is given to the problem of optimal distinction of three states, having arbitrary prior probabilities and arbitrary detection values. A generalization to more than three states is outlined.
1. Non-orthogonal quantum signals

Quantum information theory is an emerging science, which combines two traditional disciplines: quantum mechanics and classical information theory. This subject has many
fascinating potential applications for the transmission and processing of information, and 
yields results that cannot be achieved by classical means. A simple example is the use 
of quanta that have been prepared according to one of a finite set of states as signals for 
the transmission of information. The possibility of using non-orthogonal quantum states, 
which has no classical analogue, is especially interesting for its potential applications to 
cryptography (that is, for communication security) [1]. 

An observer, faced with such a set of signals whose prior probabilities are known, may 
follow various strategies. The approach favored by information theorists is to maximize 
the mutual information that can be acquired in the detection process [2]: each event is 
analyzed in a way from which it is possible to deduce definite posterior probabilities for the 
emission of the various signals, and the observer's aim is to reduce as much as possible the 
Shannon entropy of the ensemble of signals. On the other hand, communication engineers 
attempt to guess what the signal actually was, and their aim is to minimize the number of
errors [3]. Cryptographers, whose supply of signals is essentially unlimited but for whom
security is paramount, do not want any error at all, but on the other hand they are ready 
to lose some fraction of the signals. The latter strategy is the one that will be investigated 
in this article. 

The case of just two non-orthogonal signals is quite simple and well known [4-6]. 
Recently, Chefles [7] investigated the case of N linearly independent signals, and obtained 
some partial results. In the following, we give a complete treatment of the case of three 
signals. Our method can readily be generalized to a larger number of signals (but explicit 
calculations become tedious). 

In the next section, we introduce a set of positive operator valued measures which 
describe generalized quantum measurements. (These are more general than the projection 
valued measures corresponding to the standard, von Neumann type of measurement.) An
explicit algorithm is developed, to ensure the positivity of the required matrices. 

Optimization (namely, how to maximize the information gain) is discussed in Sect. 3. 
We consider the possibility that the various signals may have different "values." The 
information gain is defined as the expected average of the values of detected signals (this 
includes the possibility that some types of signals are never identified). It is then shown in
Sect. 4 that even if a measurement fails to identify with certainty a signal, it still is usually 
possible to attribute to the various signals posterior probabilities, so that the observer 
acquires at least some mutual information on the emitted signals. Finally, Sect. 5 briefly 
discusses an extension of this work to spaces with more than three dimensions. 

2. Positive operator valued measures 

Consider, in a 3-dimensional complex vector space, three linearly independent normalized state vectors, $u_1$, $u_2$, and $u_3$ (we are using here the standard notation for Euclidean vectors, as no confusion may arise). These vectors have the physical meaning of signals, and they are, in general, not orthogonal. They occur with probabilities $p_1$, $p_2$, and $p_3$, respectively. In each measurement the observer should either identify with certainty one of these signals, or get an inconclusive answer (the latter will be labelled 0, meaning "no answer"). The objective is to design a procedure that minimizes the probability of the inconclusive answer. More generally, we may attribute different values $C_j$ to the various outcomes (for example, rare signals with small $p_j$ may have larger values than frequent signals), and our aim is to maximize the expected gain of information.

Note that the number of outcomes of the measuring process is larger than the dimensionality of the vector space. Therefore we need "generalized measurements" that are represented by positive operator valued measures (POVM) [8]. Namely, we have to construct four positive semi-definite matrices $A_j$ that satisfy

$$\sum_{j=0}^{3} A_j = \mathbb{1}, \qquad (1)$$

where $\mathbb{1}$ is the unit matrix. Three of these matrices correspond to the three input signals, and the remaining one to an inconclusive answer. It is easily proved [2] that optimal $A_j$ may be taken as matrices of rank 1. However, the optimal solution may not be unique, and higher rank matrices may also be optimal, as we shall see below.

By analogy with the well known solution for the case of two input vectors [4-6], let us define three auxiliary (unnormalized) vectors $v_j$ as follows:

$$v_1 = u_2 \times u_3, \qquad (2)$$

and cyclic permutations. We thus have

$$(u_i, v_j) = \delta_{ij}\,[u_1 u_2 u_3], \qquad (3)$$

where $[u_1 u_2 u_3]$ stands for the triple product of the input vectors (that is, the determinant of their components, in any basis).
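Relation (3) is easy to check numerically. The minimal plain-Python sketch below uses the signals of Eq. (14). It assumes the inner product $(a, b) = \sum_k a_k^* b_k$, which is antilinear in its first argument. With complex components, that convention requires reading Eq. (2) as the complex conjugate of the ordinary vector product, $v_1 = (u_2 \times u_3)^*$. The diagonal entries then come out as $[u_1u_2u_3]^*$, and their modulus is unchanged. The helper names `inner`, `cross`, `triple` and `dual_vectors` are illustrative.

```python
# Signals of Eq. (14), stored as rows; any linearly independent triple would do.
U = [
    [1, 0, 0],
    [0.6, 0.8, 0],
    [0.5, 0.5 + 0.5j, 0.5],
]


def inner(a, b):
    """(a, b) = sum_k conj(a_k) * b_k, antilinear in the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def cross(a, b):
    """Ordinary (bilinear) vector product of two 3-component vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def triple(a, b, c):
    """Triple product [a b c] = a . (b x c), the determinant of the components."""
    return sum(x * y for x, y in zip(a, cross(b, c)))


def dual_vectors(u):
    """v_j = conj(u_{j+1} x u_{j+2}), cyclic indices; the conjugation is our convention."""
    return [[complex(z).conjugate() for z in cross(u[(j + 1) % 3], u[(j + 2) % 3])]
            for j in range(3)]


V = dual_vectors(U)
D = triple(*U)
# table[i][j] = (u_i, v_j): conj(D) on the diagonal, zero elsewhere.
table = [[inner(U[i], V[j]) for j in range(3)] for i in range(3)]
print(abs(D))
```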

We then construct with the $v_j$ three POVM matrices, which correspond to outcomes of experiments that give a definite identification of an input signal:

$$A_j = k_j\,|v_j\rangle\langle v_j|, \qquad (4)$$

where the $k_j$ are non-negative numbers that still have to be determined. Indeed, the probability that the $j$-th outcome results from the $i$-th input is

$$(u_i, A_j u_i) = k_j\,|(u_i, v_j)|^2. \qquad (5)$$

This vanishes if $j \neq i$. Therefore, observing the $j$-th outcome implies that the input was $u_j$. This result occurs with probability

$$P_j = p_j\,k_j\,|[u_1 u_2 u_3]|^2. \qquad (6)$$
Note that the input states $u_j$ must be linearly independent in order to unambiguously distinguish any one of them. It will be convenient for future use to introduce the notation

$$T = |[u_1 u_2 u_3]|^2. \qquad (7)$$

This can also be written as $T = |[v_1 v_2 v_3]|$, or

$$T = 1 + s_{12}s_{23}s_{31} + s_{13}s_{32}s_{21} - |s_{12}|^2 - |s_{23}|^2 - |s_{31}|^2, \qquad (8)$$

where $s_{ij} = (u_i, u_j)$.
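The three expressions for $T$ can be compared directly. The sketch below computes $T$ from Eq. (7), from the overlaps in Eq. (8), and from the triple product of the dual vectors. It assumes the signals of Eq. (14), plain Python, and complex-conjugated vector products for the $v_j$, so that Eq. (3) holds with an inner product antilinear in its first argument. The function names are illustrative.

```python
U = [[1, 0, 0], [0.6, 0.8, 0], [0.5, 0.5 + 0.5j, 0.5]]   # signals of Eq. (14)


def inner(a, b):
    """(a, b) = sum_k conj(a_k) * b_k."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def triple(a, b, c):
    return sum(x * y for x, y in zip(a, cross(b, c)))


def t_from_triple(u):
    """Eq. (7): T = |[u1 u2 u3]|^2."""
    return abs(triple(*u)) ** 2


def t_from_overlaps(u):
    """Eq. (8), valid for normalized u_j; s[i][j] = (u_i, u_j)."""
    s = [[inner(u[i], u[j]) for j in range(3)] for i in range(3)]
    t = (1 + s[0][1] * s[1][2] * s[2][0] + s[0][2] * s[2][1] * s[1][0]
         - abs(s[0][1]) ** 2 - abs(s[1][2]) ** 2 - abs(s[2][0]) ** 2)
    # The two cubic terms are complex conjugates of each other, so t is real.
    return t.real


def t_from_duals(u):
    """|[v1 v2 v3]| with v_j = conj(u_{j+1} x u_{j+2})."""
    v = [[complex(z).conjugate() for z in cross(u[(j + 1) % 3], u[(j + 2) % 3])]
         for j in range(3)]
    return abs(triple(*v))


print(t_from_triple(U), t_from_overlaps(U), t_from_duals(U))
```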

Finally, the remaining POVM matrix, which indicates an inconclusive answer, is given by

$$A_0 = \mathbb{1} - \sum_{j=1}^{3} A_j. \qquad (9)$$

The probability of the inconclusive answer is

$$P_0 = \sum_{j=1}^{3} p_j\,(u_j, A_0 u_j) = 1 - T \sum_{j=1}^{3} k_j p_j. \qquad (10)$$
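Equation (10) can be cross-checked by building $A_0$ explicitly and evaluating $\sum_j p_j (u_j, A_0 u_j)$ term by term. In the plain-Python sketch below, the priors and the values of $k_j$ are hypothetical and serve only to exercise the identity. They are not claimed to be optimal. The helper names are illustrative, and the $v_j$ are again taken as complex-conjugated vector products.

```python
U = [[1, 0, 0], [0.6, 0.8, 0], [0.5, 0.5 + 0.5j, 0.5]]   # signals of Eq. (14)
p = [1 / 3, 1 / 3, 1 / 3]    # hypothetical priors
k = [1.0, 0.5, 0.3]          # hypothetical k_j, for illustration only


def inner(a, b):
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def triple(a, b, c):
    return sum(x * y for x, y in zip(a, cross(b, c)))


def dual_vectors(u):
    return [[complex(z).conjugate() for z in cross(u[(j + 1) % 3], u[(j + 2) % 3])]
            for j in range(3)]


def inconclusive_element(v, k):
    """A_0 = 1 - sum_j k_j |v_j><v_j| as a 3x3 list of rows, Eqs. (4) and (9)."""
    return [[(1.0 if r == c else 0.0)
             - sum(kj * vj[r] * complex(vj[c]).conjugate() for kj, vj in zip(k, v))
             for c in range(3)] for r in range(3)]


def expectation(u, a):
    """(u, A u) for a matrix A stored as rows."""
    au = [sum(a[r][c] * u[c] for c in range(3)) for r in range(3)]
    return inner(u, au)


V = dual_vectors(U)
T = abs(triple(*U)) ** 2
A0 = inconclusive_element(V, k)
P0_direct = sum(pj * expectation(uj, A0).real for pj, uj in zip(p, U))
P0_formula = 1 - T * sum(kj * pj for kj, pj in zip(k, p))     # Eq. (10)
print(P0_direct, P0_formula)
```

The agreement rests on Eq. (3): each $u_j$ overlaps only with its own $v_j$, so $(u_j, A_0 u_j) = 1 - k_j T$.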

We naturally want the $k_j$ to be as large as possible, in order to increase the detection probabilities, but their values are bounded above by the demand of positivity of $A_0$. Recall that the necessary and sufficient conditions for the positivity of a matrix are the positivity of all the diagonal elements and diagonal subdeterminants, including the determinant of the entire matrix:

$$\det A_0 \geq 0. \qquad (11)$$

In the present case, this last condition is the decisive one that actually determines the domain of acceptable values of $k_j$. This is intuitively seen as follows: when all $k_j$ vanish, $A_0 = \mathbb{1}$, which has only positive eigenvalues. As we gradually increase the $k_j$, one of the eigenvalues of $A_0$ will vanish and then become negative. When it vanishes, the determinant vanishes too (because it is equal to the product of the eigenvalues), and this gives the boundary of the domain of legal $k_j$. The surface $\det A_0 = 0$ consists of several disjoint parts. The role of the other positivity conditions is to eliminate (in practice, to confirm the elimination of) the irrelevant parts of that surface.
Explicitly, the condition $\det A_0 = 0$ can be written as

$$1 - \sum_{j=1}^{3} |v_j|^2 k_j + T\,(k_1k_2 + k_2k_3 + k_3k_1) - T^2 k_1k_2k_3 = 0. \qquad (12)$$
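Since $\det A_0$ is a polynomial in the $k_j$, Eq. (12) can be verified against a directly computed determinant. The plain-Python sketch below uses the signals of Eq. (14) and random non-negative $k_j$, and the helper names are illustrative. The test also confirms that the surface meets each axis at $k_j = |v_j|^{-2}$.

```python
import random

U = [[1, 0, 0], [0.6, 0.8, 0], [0.5, 0.5 + 0.5j, 0.5]]   # signals of Eq. (14)


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def triple(a, b, c):
    return sum(x * y for x, y in zip(a, cross(b, c)))


def dual_vectors(u):
    return [[complex(z).conjugate() for z in cross(u[(j + 1) % 3], u[(j + 2) % 3])]
            for j in range(3)]


def inconclusive_element(v, k):
    """A_0 = 1 - sum_j k_j |v_j><v_j|, Eqs. (4) and (9)."""
    return [[(1.0 if r == c else 0.0)
             - sum(kj * vj[r] * complex(vj[c]).conjugate() for kj, vj in zip(k, v))
             for c in range(3)] for r in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix given as rows (cofactor expansion)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def surface(k, v, t):
    """Left-hand side of Eq. (12)."""
    n = [sum(abs(z) ** 2 for z in vj) for vj in v]     # |v_j|^2
    return (1 - sum(nj * kj for nj, kj in zip(n, k))
            + t * (k[0] * k[1] + k[1] * k[2] + k[2] * k[0])
            - t ** 2 * k[0] * k[1] * k[2])


V = dual_vectors(U)
T = abs(triple(*U)) ** 2
rng = random.Random(1)
max_err = 0.0
for _ in range(100):
    k = [rng.uniform(0, 5) for _ in range(3)]
    exact = det3(inconclusive_element(V, k))
    max_err = max(max_err, abs(exact - surface(k, V, T)))
print(max_err)
```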

A simple way of obtaining Eq. (12) is to choose a basis in our vector space such that the vector components are as simple as possible. Let the first basis vector be $u_1$ itself, and the second one be a linear combination of $u_1$ and $u_2$, with real coefficients. This determines the third basis vector, up to a phase. We can choose phases so that $u_3$ has at most one complex coefficient. We thus obtain

$$u_1 = (1, 0, 0), \quad u_2 = (a_2, b_2, 0), \quad u_3 = (a_3, b_3 e^{i\beta}, c_3). \qquad (13)$$

Recall that all these vectors are normalized. It is now easy to write $\det A_0$ explicitly in terms of the parameters in Eq. (13), and then to express these parameters in terms of the various vectors. The resulting surface, $\det A_0 = 0$, is sketched in Fig. 1, for the following choice of parameters:

$$u_1 = (1, 0, 0), \quad u_2 = (0.6, 0.8, 0), \quad u_3 = (0.5, 0.5 + 0.5i, 0.5). \qquad (14)$$

The surface given by Eq. (12) intersects each $k_j$ axis at $k_j = |v_j|^{-2}$. Note that, in the first octant, this surface is everywhere convex. This can be seen as follows. Let us cut it by one of the planes $k_j = $ const. The intersection is a rectangular hyperbola with asymptotes parallel to the remaining axes. For example, if we cut the surface (12) by the plane $k_3 = $ const., the asymptote $k_1 \to \infty$ is explicitly obtained by dividing Eq. (12) by $k_1$ and then letting $k_1 \to \infty$. This gives

$$-|v_1|^2 + T\,(k_2 + k_3) - T^2 k_2 k_3 = 0. \qquad (15)$$

It is then easily seen that for any fixed $k_3$ such that $0 < k_3 < |v_3|^{-2}$, the resulting $k_2$ is positive. Indeed, Eq. (15) gives $k_2 = (|v_1|^2 - T k_3)/[T(1 - T k_3)]$, whose numerator and denominator are both positive in that range, because $T \le |v_3|^2$ and $T \le |v_1|^2 |v_3|^2$. This means that, in the plane $k_3 = $ const., the asymptote $k_1 \to \infty$ cuts the positive part of the $k_2$ axis. The same result holds for any other choice of section parallel to one of the coordinate planes. This proves the convexity of the surface in Fig. 1: all these sections are convex segments of rectangular hyperbolas.
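The convexity argument can be spot-checked for the signals of Eq. (14). Equation (12) is affine in each $k_j$ separately, so on a section $k_3 = $ const. one can solve for $k_2$ as a function of $k_1$. The plain-Python sketch below uses this to do three things: evaluate the asymptote from Eq. (15); sample the part of the section that lies in the first octant; and check that the second differences of $k_2(k_1)$ are negative. A negative second difference means the curve bends toward the origin, as the boundary of a convex domain should. The helper names and the sampled $k_3$ values are illustrative choices, and a finite sample is a spot check, not a proof.

```python
U = [[1, 0, 0], [0.6, 0.8, 0], [0.5, 0.5 + 0.5j, 0.5]]   # signals of Eq. (14)


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def triple(a, b, c):
    return sum(x * y for x, y in zip(a, cross(b, c)))


V = [[complex(z).conjugate() for z in cross(U[(j + 1) % 3], U[(j + 2) % 3])]
     for j in range(3)]
T = abs(triple(*U)) ** 2
n1, n2, n3 = [sum(abs(z) ** 2 for z in vj) for vj in V]     # |v_j|^2


def asymptote_k2(k3):
    """Horizontal asymptote of the section k3 = const, solved from Eq. (15)."""
    return (n1 - T * k3) / (T * (1 - T * k3))


def section_k2(k1, k3):
    """k2 on the surface (12) for given k1 and k3 (Eq. (12) is affine in k2)."""
    a = 1 - n1 * k1 - n3 * k3 + T * k1 * k3          # terms without k2
    b = -n2 + T * (k1 + k3) - T ** 2 * k1 * k3        # coefficient of k2
    return -a / b


def second_differences(k3, samples=50):
    """Second differences of k2(k1) over the part of the section with k2 >= 0."""
    k1_max = (1 - n3 * k3) / (n1 - T * k3)             # where the section meets k2 = 0
    xs = [k1_max * i / (samples - 1) for i in range(samples)]
    ys = [section_k2(x, k3) for x in xs]
    return [ys[i - 1] - 2 * ys[i] + ys[i + 1] for i in range(1, samples - 1)]


for k3 in (0.0, 0.5, 1.0, 1.5):    # all below 1/|v_3|^2 = 1.5625 for these signals
    print(k3, asymptote_k2(k3), max(second_differences(k3)))
```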

3. Optimization 

Finally, we are left with the problem of finding the set of $k_j$ that maximize the information gain. The latter is

$$G = T \sum_{j=1}^{3} C_j p_j k_j, \qquad (16)$$

where $C_j$ is the "value" of signal $u_j$ and use was made of Eq. (5). Define, for brevity,

$$B_j = C_j p_j. \qquad (17)$$

All points of the plane

$$\sum_{j=1}^{3} B_j k_j = G/T, \qquad (18)$$

with $k_j \geq 0$, lead to the same information gain $G$, provided that these points belong to the domain of positivity of $A_0$. The largest value of $G$ can be obtained as follows.

Let us imagine that we start with a plane $\sum_j B_j k_j = X$, with large positive $X$, so that there is no contact between that plane and the relevant part of the surface (12). As we gradually decrease $X$, the plane will reach a point where it is tangent to that surface (thanks to its convexity). This happens at the point where the gradient of the left hand side of (12) is parallel to the vector $\{B_j\}$. If the point of contact lies in the first octant, it gives the optimal solution. It may happen, however, that at this point of contact one of the $k_j$ is negative, and therefore that point is not a valid solution. In that case, we further decrease $X$, until a contact point occurs on one of the coordinate planes (that is, one of the $k_j$ vanishes), or even at one of the vertices (two of them vanish).

For example, when all $p_j = 1/3$ and all $C_j = 1$, the optimal result is obtained when $k_1 = 2.4189$, $k_2 = 0$, and $k_3 = 0.6064$. This result means that we sacrifice the possibility of detecting signal $u_2$ in order to get the lowest probability for the inconclusive answer, as may be seen from Eq. (10). In the present case, we obtain $P_0 = 0.8386$. On the other hand, if we give different values to the signals, such as $C_1 = 0.8$, $C_2 = 1.2$, and $C_3 = 1$, the optimal result is obtained with $k_1 = 2.083$, $k_2 = 0.2902$, and $k_3 = 0.2129$. The probability to get an inconclusive answer then is slightly higher: $P_0 = 0.8626$.
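The tangent-plane construction is geometric. A crude numerical cross-check is to scan directions instead. The sketch below is a swapped-in brute-force method, not the procedure described above. For each direction $d$ on a grid of the simplex ($d_j \ge 0$, $\sum_j d_j = 1$), it bisects along the ray $k = t\,d$ to find the largest $t$ for which $A_0$ is still positive definite, using Sylvester's criterion for Hermitian matrices. That point lies on the relevant sheet of the surface (12). The sketch then keeps the direction with the largest $G$ from Eq. (16). It is plain Python; the helper names and the grid resolution `steps` are illustrative, and the accuracy is limited by the grid spacing.

```python
U = [[1, 0, 0], [0.6, 0.8, 0], [0.5, 0.5 + 0.5j, 0.5]]   # signals of Eq. (14)


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def triple(a, b, c):
    return sum(x * y for x, y in zip(a, cross(b, c)))


def dual_vectors(u):
    return [[complex(z).conjugate() for z in cross(u[(j + 1) % 3], u[(j + 2) % 3])]
            for j in range(3)]


def inconclusive_element(v, k):
    """A_0 = 1 - sum_j k_j |v_j><v_j|, Eqs. (4) and (9)."""
    return [[(1.0 if r == c else 0.0)
             - sum(kj * vj[r] * complex(vj[c]).conjugate() for kj, vj in zip(k, v))
             for c in range(3)] for r in range(3)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def is_positive_definite(a):
    """Sylvester's criterion for a Hermitian 3x3 matrix: all leading minors > 0."""
    m1 = a[0][0].real
    m2 = (a[0][0] * a[1][1] - a[0][1] * a[1][0]).real
    return m1 > 0 and m2 > 0 and det3(a).real > 0


def boundary_scale(v, d, iters=40):
    """Largest t keeping A_0(t * d) positive definite, found by bisection."""
    norms = [sum(abs(z) ** 2 for z in vj) for vj in v]
    # At this t, (v_j, A_0 v_j) <= 0 for the dominant j, so A_0 is not positive.
    hi = 1.0 / max(dj * nj for dj, nj in zip(d, norms))
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if is_positive_definite(inconclusive_element(v, [mid * dj for dj in d])):
            lo = mid
        else:
            hi = mid
    return lo


def optimize(u, p, c, steps=40):
    """Grid search over ray directions for the largest gain G of Eq. (16)."""
    v = dual_vectors(u)
    t_val = abs(triple(*u)) ** 2
    b = [cj * pj for cj, pj in zip(c, p)]                       # Eq. (17)
    best_gain, best_k = -1.0, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            d = [i / steps, j / steps, (steps - i - j) / steps]
            k = [boundary_scale(v, d) * dj for dj in d]
            gain = t_val * sum(bj * kj for bj, kj in zip(b, k))  # Eq. (16)
            if gain > best_gain:
                best_gain, best_k = gain, k
    p0 = 1 - t_val * sum(kj * pj for kj, pj in zip(best_k, p))  # Eq. (10)
    return best_k, best_gain, p0


k, gain, p0 = optimize(U, p=[1 / 3] * 3, c=[1, 1, 1])
print([round(x, 3) for x in k], round(p0, 4))
```

For $p_j = 1/3$ and $C_j = 1$, the search should pick a direction with $k_2 = 0$, with $k_1 \approx 2.42$ and $P_0 \approx 0.839$. This is in line with the first example above. Other values $C_j$ can be passed through the `c` argument.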

4. Inconclusive answers still carry some information 

An inconclusive answer is not completely useless (except in special, highly symmetric cases). For example, if $u_1$ is orthogonal to $u_2$ and $u_3$, and these are not orthogonal to each other, then $v_1$ is parallel to $u_1$, and $v_2$ and $v_3$ lie in the $u_2u_3$ plane. The $A_0$ matrix is of rank 1: $A_0 = |w\rangle\langle w|$, with $w$ in the $u_2u_3$ plane. In such a case, the signal $u_1$ is always detected with certainty, while an inconclusive result means: either $u_2$ or $u_3$ (with known posterior probabilities, as explained below).

In general, for arbitrary $u_j$, the optimal $A_0$ is a matrix of rank 2, which can be written in terms of its eigenvalues and eigenvectors:

$$A_0 = \lambda_m |m\rangle\langle m| + \lambda_n |n\rangle\langle n|. \qquad (19)$$

Each one of the two terms on the right hand side is by itself a legitimate POVM element, so that there can actually be two distinct inconclusive outcomes. Let us label them $m$ and $n$.

Suppose that the outcome of a generalized measurement turns out to be $m$. The prior probability for that result, if the input was $u_j$, is

$$P_{mj} = p_j\,\lambda_m\,|(m, u_j)|^2. \qquad (20)$$

By Bayes's theorem, the posterior probability for input $u_j$ upon observing output $m$ is [8]

$$Q_{jm} = P_{mj} \Big/ \sum_{i=1}^{3} P_{mi}. \qquad (21)$$

The observer's final ignorance level, after receiving output $m$, is given by the Shannon entropy,

$$H_m = -\sum_{j=1}^{3} Q_{jm} \ln Q_{jm}. \qquad (22)$$

This need not be, but often is, less than the initial entropy,

$$H_{\rm init} = -\sum_{j=1}^{3} p_j \ln p_j, \qquad (23)$$

so that some information has been gained, even though the result is inconclusive.
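Equations (20)-(23) are straightforward to evaluate. The plain-Python sketch below uses natural logarithms, as in Eqs. (22)-(23), and the special configuration described at the start of this section: $u_1$ orthogonal to $u_2$ and $u_3$, with a hypothetical overlap $s = 0.6$ between the latter. For the inconclusive direction $m$ it simply takes the normalized sum $u_2 + u_3$. That is an assumption for illustration, not a result derived here from the optimization. The eigenvalue $\lambda_m$ cancels in Bayes's rule. The helper names are illustrative.

```python
import math


def inner(a, b):
    """(a, b) = sum_k conj(a_k) * b_k."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def posteriors(prior, signals, lam, m):
    """Eqs. (20)-(21): P_mj = p_j * lam * |(m, u_j)|^2, normalized over j (Bayes)."""
    joint = [pj * lam * abs(inner(m, uj)) ** 2 for pj, uj in zip(prior, signals)]
    total = sum(joint)
    return [x / total for x in joint]


def entropy(probs):
    """Shannon entropy in nats, -sum q ln q, with 0 ln 0 taken as 0."""
    return -sum(q * math.log(q) for q in probs if q > 0)


s = 0.6                                      # hypothetical overlap (u_2, u_3)
u = [[1, 0, 0], [0, 1, 0], [0, s, math.sqrt(1 - s * s)]]
p = [1 / 3, 1 / 3, 1 / 3]
# Hypothetical inconclusive direction: normalized u_2 + u_3 (illustration only).
w = [a + b for a, b in zip(u[1], u[2])]
norm = math.sqrt(sum(abs(x) ** 2 for x in w))
m = [x / norm for x in w]

Q = posteriors(p, u, lam=0.5, m=m)           # lam cancels in Bayes's rule
H_m, H_init = entropy(Q), entropy(p)
print(Q, H_m, H_init)
```

Here the inconclusive outcome rules out $u_1$ completely and leaves $u_2$ and $u_3$ equally likely. The entropy therefore drops from $\ln 3$ to $\ln 2$.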
5. Higher dimensional space 

Finally, let us briefly outline how the above results can be generalized to $N$ signals ($N > 3$). Consider the $N$-th order matrix formed by the components of all the input vectors, in any basis. Instead of the triple product $[u_1u_2u_3]$, we now have the determinant of that matrix. Vector products $v_j$ such as in Eq. (2) become generalized vector products of the $N - 1$ signal states other than $u_j$. Their components, in any basis, are the appropriate cofactors in the above determinant. The argument leading to Eq. (12) remains essentially the same, and we now obtain an $(N - 1)$-dimensional hypersurface in the $N$-dimensional $k$-space. It is plausible that this hypersurface is convex in the first orthant (i.e., hyper-octant) in $k$-space. A formal proof of this conjecture is a straightforward but tedious exercise in differential geometry (perhaps a more clever proof can be found). Optimization then proceeds as in Sect. 3, by considering a family of parallel hyperplanes $\sum_j B_j k_j = X$.
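For numerical work with $N > 3$, the cofactor construction can be replaced by a matrix inverse. Let $G$ be the matrix whose rows are $u_i^*$. Then the columns of $G^{-1}$ are vectors $v_j$ with $(u_i, v_j) = \delta_{ij}$. Since an inverse is the adjugate divided by the determinant, these differ from the cofactor vectors only by an overall factor and by the complex-conjugation convention. With this normalization, the $k_j$ of Eq. (4) absorb the factor $T$. The plain-Python sketch below uses Gauss-Jordan elimination with partial pivoting. The four signals are hypothetical, and the helper names are illustrative.

```python
def inner(a, b):
    """(a, b) = sum_k conj(a_k) * b_k."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def invert(a):
    """Gauss-Jordan inverse of a square complex matrix (list of rows), partial pivoting."""
    n = len(a)
    m = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("signal states are linearly dependent")
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [x / pivot for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def dual_vectors(u):
    """v_j with (u_i, v_j) = delta_ij: columns of G^{-1}, where G has rows conj(u_i)."""
    n = len(u)
    g_inv = invert([[complex(x).conjugate() for x in ui] for ui in u])
    return [[g_inv[r][j] for r in range(n)] for j in range(n)]


# Four hypothetical, normalized, linearly independent signals in 4 dimensions.
U4 = [
    [1, 0, 0, 0],
    [0.6, 0.8, 0, 0],
    [0.5, 0.5j, 0.5, 0.5],
    [0, 0, 0.6, 0.8j],
]
V4 = dual_vectors(U4)
for i in range(4):
    print([round(abs(inner(U4[i], vj)), 6) for vj in V4])
```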

There are now many possibilities of partial answers. For example, if the signal states $u_j$
can be divided into two (or more) mutually orthogonal subspaces, it is possible, in a first 
step, to determine unambiguously the subspace to which each signal belongs. Then, a 
second step is to try to identify individual non-orthogonal signals within a given subspace. 

An interesting problem is how to utilize the resulting mixed information, with some of 
the signals fully identified, and others only partly identified. For example, if we have two 
mutually orthogonal subspaces, and in each one two non-orthogonal states, an individual 
state encodes two bits, but a subspace is still worth one bit, plus some amount of mutual 
(probabilistic) information. Further investigation is needed to clarify this issue. 
Figure 1. Domain of positivity of $A_0$.